{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e9e3f209-1b47-41aa-bb33-d0e7b564203c",
   "metadata": {
    "height": 30
   },
   "source": [
    "# 部署知识库助手\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2896c72f-1aa0-4b93-aea6-45908a6e42a1",
   "metadata": {},
   "source": [
    "我们对知识库和LLM已经有了基本的理解，现在是将它们巧妙地融合并打造成一个富有视觉效果的界面的时候了。这样的界面不仅对操作更加便捷，还能便于与他人分享。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c975542-100a-431f-bfdb-2e948fd1e360",
   "metadata": {
    "height": 30
   },
   "source": [
    "Streamlit 是一种快速便捷的方法，可以直接在 **Python 中通过友好的 Web 界面演示机器学习模型**。在本课程中，我们将学习*如何使用它为生成式人工智能应用程序构建用户界面*。在构建了机器学习模型后，如果你想构建一个 demo 给其他人看，也许是为了获得反馈并推动系统的改进，或者只是因为你觉得这个系统很酷，所以想演示一下：Streamlit 可以让您通过 Python 接口程序快速实现这一目标，而无需编写任何前端、网页或 JavaScript 代码。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "caa74cbe-96ed-4652-a761-8740615597ed",
   "metadata": {
    "height": 30
   },
   "source": [
    "## 一、Streamlit 简介"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bcf25020-22a5-435d-925a-26cbe71a5f59",
   "metadata": {},
   "source": [
    "\n",
    "`Streamlit` 是一个用于快速创建数据应用程序的开源 Python 库。它的设计目标是让数据科学家能够轻松地将数据分析和机器学习模型转化为具有交互性的 Web 应用程序，而无需深入了解 Web 开发。和常规 Web 框架，如 Flask/Django 的不同之处在于，它不需要你去编写任何客户端代码（HTML/CSS/JS），只需要编写普通的 Python 模块，就可以在很短的时间内创建美观并具备高度交互性的界面，从而快速生成数据分析或者机器学习的结果；另一方面，和那些只能通过拖拽生成的工具也不同的是，你仍然具有对代码的完整控制权。\n",
    "\n",
    "Streamlit 提供了一组简单而强大的基础模块，用于构建数据应用程序：\n",
    "\n",
    "- st.write()：这是最基本的模块之一，用于在应用程序中呈现文本、图像、表格等内容。\n",
    "\n",
    "- st.title()、st.header()、st.subheader()：这些模块用于添加标题、子标题和分组标题，以组织应用程序的布局。\n",
    "\n",
    "- st.text()、st.markdown()：用于添加文本内容，支持 Markdown 语法。\n",
    "\n",
    "- st.image()：用于添加图像到应用程序中。\n",
    "\n",
    "- st.dataframe()：用于呈现 Pandas 数据框。\n",
    "\n",
    "- st.table()：用于呈现简单的数据表格。\n",
    "\n",
    "- st.pyplot()、st.altair_chart()、st.plotly_chart()：用于呈现 Matplotlib、Altair 或 Plotly 绘制的图表。\n",
    "\n",
    "- st.selectbox()、st.multiselect()、st.slider()、st.text_input()：用于添加交互式小部件，允许用户在应用程序中进行选择、输入或滑动操作。\n",
    "\n",
    "- st.button()、st.checkbox()、st.radio()：用于添加按钮、复选框和单选按钮，以触发特定的操作。\n",
    "\n",
    "这些基础模块使得通过 Streamlit 能够轻松地构建交互式数据应用程序，并且在使用时可以根据需要进行组合和定制，更多内容请查看[官方文档](https://docs.streamlit.io/get-started)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c3209b0",
   "metadata": {},
   "source": [
    "## 二、构建应用程序"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62b23f9f",
   "metadata": {},
   "source": [
    "首先，创建一个新的 Python 文件并将其保存 streamlit_app.py在工作目录的根目录中"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf5e3845",
   "metadata": {},
   "source": [
    "1. 导入必要的 Python 库。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "3c4172dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "import streamlit as st\n",
    "from langchain_openai import ChatOpenAI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "097d95cb",
   "metadata": {},
   "source": [
    "2. 创建应用程序的标题`st.title`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "562cd2c3",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-11-10 22:30:34.273 \n",
      "  \u001b[33m\u001b[1mWarning:\u001b[0m to view this Streamlit app on a browser, run it with the following\n",
      "  command:\n",
      "\n",
      "    streamlit run /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/ipykernel_launcher.py [ARGUMENTS]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "DeltaGenerator()"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "st.title('🦜🔗 动手学大模型应用开发')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d0465f6",
   "metadata": {},
   "source": [
    "3. 添加一个文本输入框，供用户输入其 OpenAI API 密钥"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "cfea4bd9",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-11-10 22:30:34.278 Session state does not function when running a script without `streamlit run`\n"
     ]
    }
   ],
   "source": [
    "openai_api_key = st.sidebar.text_input('OpenAI API Key', type='password')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e882d27",
   "metadata": {},
   "source": [
    "4. 定义一个函数，使用用户密钥对 OpenAI API 进行身份验证、发送提示并获取 AI 生成的响应。该函数接受用户的提示作为参数，并使用`st.info`来在蓝色框中显示 AI 生成的响应"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "9759d289",
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_response(input_text):\n",
    "    llm = ChatOpenAI(temperature=0.7, openai_api_key=openai_api_key)\n",
    "    st.info(llm(input_text))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7c18041",
   "metadata": {},
   "source": [
    "5. 最后，使用`st.form()`创建一个文本框（st.text_area()）供用户输入。当用户单击`Submit`时，`generate-response()`将使用用户的输入作为参数来调用该函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "1ce9471c",
   "metadata": {},
   "outputs": [],
   "source": [
    "with st.form('my_form'):\n",
    "    text = st.text_area('Enter text:', 'What are the three key pieces of advice for learning how to code?')\n",
    "    submitted = st.form_submit_button('Submit')\n",
    "    if not openai_api_key.startswith('sk-'):\n",
    "        st.warning('Please enter your OpenAI API key!', icon='⚠')\n",
    "    if submitted and openai_api_key.startswith('sk-'):\n",
    "        generate_response(text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f024f5b",
   "metadata": {},
   "source": [
    "6. 保存当前的文件`streamlit_app.py`！"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "887beac1",
   "metadata": {},
   "source": [
    "7. 返回计算机的终端以运行该应用程序\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "25271c4d",
   "metadata": {},
   "outputs": [
    {
     "ename": "SyntaxError",
     "evalue": "invalid syntax (1817081337.py, line 1)",
     "output_type": "error",
     "traceback": [
      "\u001b[0;36m  Cell \u001b[0;32mIn[6], line 1\u001b[0;36m\u001b[0m\n\u001b[0;31m    streamlit run streamlit_app.py\u001b[0m\n\u001b[0m              ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n"
     ]
    }
   ],
   "source": [
    "streamlit run streamlit_app.py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a84b80b",
   "metadata": {},
   "source": [
    "结果展示如下：  \n",
    "  \n",
    "![](../../figures/streamlit_app.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9c38d3b",
   "metadata": {},
   "source": [
    "但是当前只能进行单轮对话，我们对上述做些修改，通过使用 `st.session_state` 来存储对话历史，可以在用户与应用程序交互时保留整个对话的上下文。\n",
    "  \n",
    "![](../../figures/streamlit_app2.png)\n",
    "  \n",
    "具体代码如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "7914b7db",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Streamlit 应用程序界面\n",
    "def main():\n",
    "    st.title('🦜🔗 动手学大模型应用开发')\n",
    "    openai_api_key = st.sidebar.text_input('OpenAI API Key', type='password')\n",
    "\n",
    "    # 用于跟踪对话历史\n",
    "    if 'messages' not in st.session_state:\n",
    "        st.session_state.messages = []\n",
    "\n",
    "    messages = st.container(height=300)\n",
    "    if prompt := st.chat_input(\"Say something\"):\n",
    "        # 将用户输入添加到对话历史中\n",
    "        st.session_state.messages.append({\"role\": \"user\", \"text\": prompt})\n",
    "\n",
    "        # 调用 respond 函数获取回答\n",
    "        answer = generate_response(prompt, openai_api_key)\n",
    "        # 检查回答是否为 None\n",
    "        if answer is not None:\n",
    "            # 将LLM的回答添加到对话历史中\n",
    "            st.session_state.messages.append({\"role\": \"assistant\", \"text\": answer})\n",
    "\n",
    "        # 显示整个对话历史\n",
    "        for message in st.session_state.messages:\n",
    "            if message[\"role\"] == \"user\":\n",
    "                messages.chat_message(\"user\").write(message[\"text\"])\n",
    "            elif message[\"role\"] == \"assistant\":\n",
    "                messages.chat_message(\"assistant\").write(message[\"text\"])   "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "483c244a",
   "metadata": {},
   "source": [
    "## 三、添加检索问答"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6916ca5e",
   "metadata": {},
   "source": [
    "先将`2.构建检索问答链`部分的代码进行封装：\n",
    "- get_vectordb函数返回C3部分持久化后的向量知识库\n",
    "- get_chat_qa_chain函数返回调用带有历史记录的检索问答链后的结果\n",
    "- get_qa_chain函数返回调用不带有历史记录的检索问答链后的结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "5943aca4",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_vectordb():\n",
    "    # 定义 Embeddings\n",
    "    embedding = ZhipuAIEmbeddings()\n",
    "    # 向量数据库持久化路径\n",
    "    persist_directory = '../C3 搭建知识库/data_base/vector_db/chroma'\n",
    "    # 加载数据库\n",
    "    vectordb = Chroma(\n",
    "        persist_directory=persist_directory,  # 允许我们将persist_directory目录保存到磁盘上\n",
    "        embedding_function=embedding\n",
    "    )\n",
    "    return vectordb\n",
    "\n",
    "#带有历史记录的问答链\n",
    "def get_chat_qa_chain(question:str,openai_api_key:str):\n",
    "    vectordb = get_vectordb()\n",
    "    llm = ChatOpenAI(model_name = \"gpt-3.5-turbo\", temperature = 0,openai_api_key = openai_api_key)\n",
    "    memory = ConversationBufferMemory(\n",
    "        memory_key=\"chat_history\",  # 与 prompt 的输入变量保持一致。\n",
    "        return_messages=True  # 将以消息列表的形式返回聊天记录，而不是单个字符串\n",
    "    )\n",
    "    retriever=vectordb.as_retriever()\n",
    "    qa = ConversationalRetrievalChain.from_llm(\n",
    "        llm,\n",
    "        retriever=retriever,\n",
    "        memory=memory\n",
    "    )\n",
    "    result = qa({\"question\": question})\n",
    "    return result['answer']\n",
    "\n",
    "#不带历史记录的问答链\n",
    "def get_qa_chain(question:str,openai_api_key:str):\n",
    "    vectordb = get_vectordb()\n",
    "    llm = ChatOpenAI(model_name = \"gpt-3.5-turbo\", temperature = 0,openai_api_key = openai_api_key)\n",
    "    template = \"\"\"使用以下上下文来回答最后的问题。如果你不知道答案，就说你不知道，不要试图编造答\n",
    "        案。最多使用三句话。尽量使答案简明扼要。总是在回答的最后说“谢谢你的提问！”。\n",
    "        {context}\n",
    "        问题: {question}\n",
    "        \"\"\"\n",
    "    QA_CHAIN_PROMPT = PromptTemplate(input_variables=[\"context\",\"question\"],\n",
    "                                 template=template)\n",
    "    qa_chain = RetrievalQA.from_chain_type(llm,\n",
    "                                       retriever=vectordb.as_retriever(),\n",
    "                                       return_source_documents=True,\n",
    "                                       chain_type_kwargs={\"prompt\":QA_CHAIN_PROMPT})\n",
    "    result = qa_chain({\"query\": question})\n",
    "    return result[\"result\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "073fce21",
   "metadata": {},
   "source": [
    "然后，添加一个单选按钮部件`st.radio`，选择进行问答的模式：\n",
    "- None：不使用检索问答的普通模式\n",
    "- qa_chain：不带历史记录的检索问答模式\n",
    "- chat_qa_chain：带历史记录的检索问答模式"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "37b8fa45",
   "metadata": {},
   "outputs": [],
   "source": [
    "selected_method = st.radio(\n",
    "        \"你想选择哪种模式进行对话？\",\n",
    "        [\"None\", \"qa_chain\", \"chat_qa_chain\"],\n",
    "        captions = [\"不使用检索问答的普通模式\", \"不带历史记录的检索问答模式\", \"带历史记录的检索问答模式\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a5917f6",
   "metadata": {},
   "source": [
    "最终的效果如下：  \n",
    "  \n",
    "![](../../figures/streamlit_app3.png)\n",
    "\n",
    "进入页面，首先先输入OPEN_API_KEY（默认），然后点击单选按钮选择进行问答的模式，最后在输入框输入你的问题，按下回车即可！\n",
    "\n",
    "> 完整代码参考[streamlit_app.py](./streamlit_app.py)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9a43de5",
   "metadata": {},
   "source": [
    "## 四、部署应用程序  \n",
    "\n",
    "要将应用程序部署到 Streamlit Cloud，请执行以下步骤：  \n",
    "  \n",
    "1. 为应用程序创建 GitHub 存储库。您的存储库应包含两个文件：  \n",
    "  \n",
    "   your-repository/  \n",
    "   ├── streamlit_app.py  \n",
    "   └── requirements.txt  \n",
    "  \n",
    "2. 转到 [Streamlit Community Cloud](http://share.streamlit.io/)，单击工作区中的`New app`按钮，然后指定存储库、分支和主文件路径。或者，您可以通过选择自定义子域来自定义应用程序的 URL\n",
    "  \n",
    "3. 点击`Deploy!`按钮  \n",
    "  \n",
    "您的应用程序现在将部署到 Streamlit Community Cloud，并且可以从世界各地访问！ 🌎  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3c13465",
   "metadata": {},
   "source": [
    "我们的项目部署到这基本完成，为了方便进行演示进行了简化，还有很多可以进一步优化的地方，期待学习者们进行各种魔改！\n",
    "\n",
    "优化方向：\n",
    "- 界面中添加上传本地文档，建立向量数据库的功能\n",
    "- 添加多种LLM 与 embedding方法选择的按钮\n",
    "- 添加修改参数的按钮\n",
    "- 更多......"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
